The 10 longest words with frequency>1, ordered by length
Length | Frequency | Word |
---|---|---|
88 | 2 | 진주오피스텔┃진주올전세┃진주리모델링주택┃진주칠암동원룸┃진주상봉동원룸┃진주아파트┃진주흥한건설┃진주투룸올전세┃진주신축건물┃진주칠암동투룸┃하대동투룸┃진주올전세쓰리룸 |
85 | 2 | 77명→63명→75명→64명→73명→75명→114명→69명→54명→72명→58명→98명→91명→84명→110명→47명→73명→91명→76명→58명→91명 |
35 | 14 | http://www.leica-microsystems.co.kr |
34 | 3 | http://www.catholichospital.co.kr/ |
33 | 6 | 부분이있어요건강검진,예방접종,성형수술,치과진료,의족,렌즈삽입 |
33 | 2 | 홍익대,마이크로프로세서,마이크로,프로세서,마프,실험42019 |
32 | 2 | ★무사고★완무★보험이력0원★차량병적관리★소소한드레스업★완전 |
30 | 5 | 대장점막내암,유방,자궁체부자궁경부,전립선,방광암에대해서 |
30 | 2 | http://xn--ij2bx6j77bo2kdi289c |
29 | 6 | http://www.saehospital.co.kr/ |
The table shows the longest words in the corpus that occur at least twice. Because every listed word was seen more than once, most one-off misprints should be excluded.
Surprisingly, the longest word is not much longer than the second-longest one. This again argues for correct preprocessing.
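The gap between the two longest entries can be expressed as a simple ratio. The following is a minimal sketch using the lengths from the table above; the function name `length_gap` is hypothetical, and the document does not define any threshold for what counts as "much longer".

```python
def length_gap(lengths):
    """Return the ratio longest / second-longest for a list of word lengths."""
    top = sorted(lengths, reverse=True)[:2]
    if len(top) < 2 or top[1] == 0:
        raise ValueError("need at least two non-zero lengths")
    return top[0] / top[1]

# Lengths copied from the "Length" column of the table above.
table_lengths = [88, 85, 35, 34, 33, 33, 32, 30, 30, 29]
print(f"{length_gap(table_lengths):.3f}")  # 88 / 85, prints 1.035
```

A ratio close to 1 means no single outlier dominates the list. It says nothing about whether the entries themselves are genuine words.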
With correct preprocessing, the longest words are genuine words. In many cases they belong to specific topics that tend to produce long terms.
With poor preprocessing, some non-word strings will appear.
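One rough way to screen a longest-words list for non-word strings is to test each entry's characters. The sketch below is a heuristic, not the method used to build this corpus: the function name `looks_like_word` and the rule (letters, combining marks and hyphens only, no URL prefix) are assumptions, and the rule will also reject legitimate tokens that contain digits.

```python
import unicodedata

def looks_like_word(token):
    """Heuristic: True if the token is non-empty, is not a URL, and consists
    only of letters, combining marks, or hyphens."""
    if not token:
        return False
    if token.lower().startswith(("http://", "https://", "www.")):
        return False
    return all(
        ch == "-" or unicodedata.category(ch)[0] in ("L", "M")
        for ch in token
    )

# A few entries in the style of the table above, plus one plain word.
candidates = [
    "진주아파트",
    "http://www.saehospital.co.kr/",
    "★무사고★완무",
    "77명→63명",
]
for token in candidates:
    print(looks_like_word(token), token)
```

Entries rejected by such a filter are candidates for a closer look at tokenization, for example delimiter characters that were not treated as word boundaries.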
The length of the longest words clearly depends on language and corpus size.
select char_length(word) as le, freq, word from words where freq>1 order by le desc limit 10;
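The query can be tried on a small in-memory database. This is a sketch under assumptions: the `words(word, freq)` schema is inferred from the query itself, SQLite stands in for the actual database, and SQLite's `length()` replaces `char_length()` (both count characters for text values). The secondary sort on `word` is an addition to make ties deterministic.

```python
import sqlite3

def longest_words(rows, min_freq=2, limit=10):
    """Return (length, freq, word) for the longest words seen at least min_freq times."""
    con = sqlite3.connect(":memory:")
    # Assumed schema, inferred from the column names used in the query.
    con.execute("CREATE TABLE words (word TEXT, freq INTEGER)")
    con.executemany("INSERT INTO words VALUES (?, ?)", rows)
    result = con.execute(
        "SELECT length(word) AS le, freq, word FROM words "
        "WHERE freq >= ? ORDER BY le DESC, word LIMIT ?",
        (min_freq, limit),
    ).fetchall()
    con.close()
    return result

# Illustrative rows; the last one has frequency 1 and is filtered out.
sample = [
    ("http://www.saehospital.co.kr/", 6),
    ("http://www.leica-microsystems.co.kr", 14),
    ("진주아파트", 2),
    ("한번만나온단어", 1),
]
for le, freq, word in longest_words(sample):
    print(le, freq, word)
```

Raising `min_freq` trades coverage for cleanliness: fewer rare strings survive, at the cost of dropping rare but genuine long words.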
How does the length of the longest words increase with corpus size?
3.2.3.1 Longest Words in top-1000 by length